This notebook is to facilitate collaboration on building and evaluating models to estimate pharmaceutical product expiration date. The notebook contains:
In this example the impurity product, $P$, can either be produced from crystalline drug, $D$, or an amorphous intermediate, $I$, that is more reactive (this example is extracted from Waterman et al. Pharm Res Vol 24 2007 p 783).
Expressing the system in matrix form. \begin{equation} \frac{d}{dt} \begin{bmatrix} P \\ I \\ D \end{bmatrix} = \begin{bmatrix} 0 & k_2 & k_3 \\ 0 & (-k_{1r} - k_2) & k_{1f} \\ 0 & k_{1r} & (-k_{1f}-k_3) \end{bmatrix} \begin{bmatrix} P \\ I \\ D \end{bmatrix} \end{equation}
The rate constants are assumed to follow an Arrhenius temperature dependence: \begin{equation} k = A \exp \left (- \frac{E}{RT} \right ) \end{equation}
Edit the parameters below for the above reaction model.
A : refers to the Arrhenius prefactor ($\textrm{sec}^{-1}$). The subfix is defined by the above model.
E : refers to the Arrhenius activation energy (kcal/mol). The subfix is defined by the above model.
Po, Io, Do : refer to the initial product $P$, intermediate, $I$ and drug, $D$, component concentrations in the above model. The concentrations are defined as the fraction of component relative to the total amount of starting reactant. For example, if the starting formulation has 90% of the drug as crystalline, and 10% amorphous, then the ratios would be $D_o$=0.9, $I_o$=0.1, and $P_o$=0.
Temperatures: refers to the stability testing temperatures.
days: refers to the time points that each sample is measured.
impurity_setpoint: defines the acceptance criteria for the impurity concentration
When finished run menu Cell -> Run Cells. Scroll down to see the update results.
In [1]:
# kinetic parameters (kcal/mol)
A1f = 1e4
E1f = 22
A1r = 1e4
E1r = 26
A2 = 1e6
E2 = 20
A3 = 1e5
E3 = 21
Po = 0
Io = .1
Do = 0.9
# temperatures (up to 4 different temperatures)
Temperatures = [25, 40, 60, 80]
# time points in days
days = [[0, 7, 14], # days at first temperature
[0, 5, 10], # days at second temperature
[0, 3, 7], # days at third temperature
[0, 1, 5]] # days at fourth temperature
# acceptance criteria for impurity concentraiont in fraction of product degredation
impurity_setpoint = 0.05
In [2]:
# Python module imports
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
from scipy.integrate import odeint
from IPython.display import Image
from IPython.core.display import HTML
from scipy.optimize import minimize
from scipy.optimize import curve_fit
import statsmodels.formula.api as sm
%matplotlib inline
from pdb import set_trace
plt.rcParams.update({'figure.figsize':(8,6),
'figure.titlesize': 16.,
'axes.labelsize':14.,
'xtick.labelsize':12.,
'legend.fontsize': 12.})
In [3]:
# gas constant kcal/mol K
R = 0.00198588
# define the domain for the ODEs solution
ndays = np.max(days) + 365*10
dt = 5000.
t = np.arange(0, ndays*(24*3600), dt)
npts = t.shape[0]
# differential equation solution variable
cols = [(T, C) for T in Temperatures for C in ['P','I','D']]
concentrations = pd.DataFrame({col: np.zeros(t.shape[0]) for col in cols})
concentrations['t'] = t
# experimental results
iterables = [(T, d) for i, T in enumerate(Temperatures) for d in days[i]]
index = pd.MultiIndex.from_tuples(iterables, names=['Temperature', 'days'])
results = pd.DataFrame(data=np.zeros(12), index=index, columns=['P'])
# predicted times
predictions = pd.DataFrame(data=np.zeros(len(Temperatures)), index=Temperatures, columns=['Theory'])
# calculate rate constants from the user defined Arrhenius values
def Arrhenius(A, E, T):
return A*np.exp(-E/(R*T))
# calculate the rate constants for each user defined temperature
k1f = [Arrhenius(A1f, E1f, T+273.) for T in Temperatures]
k1r = [Arrhenius(A1r, E1r, T+273.) for T in Temperatures]
k2 = [Arrhenius(A2, E2, T+273.) for T in Temperatures]
k3 = [Arrhenius(A3, E3, T+273.) for T in Temperatures]
# definition of the derivative of the concentrations vector variable.
def deriv(y, t, k1f, k1r, k2, k3):
P, I, D = y
dPdt = k2*I + k3*D
dIdt = (-k1r-k2)*I + k1f*D
dDdt = k1r*I + (-k1f-k3)*D
dydt = [dPdt, dIdt, dDdt]
return dydt
# initial conditions (user defined)
yo = [Po, Io, Do]
# call solver for each user defined temperature
for i, T in enumerate(Temperatures):
yt = odeint(deriv, yo, t, args=(k1f[i], k1r[i], k2[i], k3[i]))
for j, C in enumerate(['P', 'I', 'D']):
concentrations[T,C] = yt[:,j]
# find the impurity concentration at the user defined time points
index = (np.array(days)/float(ndays)*npts).astype('int')
for i, T in enumerate(Temperatures):
for j, d in enumerate(days[i]):
results.loc[T,d].P = concentrations[T,'P'].iloc[index[i,j]]
# find the time that the reaction will produce the user-defined impurity concentration
for T in Temperatures:
predictions.loc[T].Theory = t[np.argmin(np.abs(concentrations[T,"P"] - impurity_setpoint))]/(3600*24*365)
In [4]:
t_days = t / (24*3600)
max_days_index = np.argmin(np.abs(t_days - (np.max(days)+1)))
t_days = t_days[:max_days_index]
c = concentrations.iloc[:max_days_index]
max_conc = c.iloc[:, c.columns.get_level_values(1)=='P'].max().max()
fig = plt.figure(figsize=(14,10))
for i, T in enumerate(Temperatures):
ax = fig.add_subplot(2,2,i+1)
ax.plot(t_days, c[T,'P'], color='red', label='P')
ax.plot(days[i], results.loc[T].P, ls="none", marker='o', color="red")
ax2 = ax.twinx()
ax2.plot(t_days, c[T,'I'], color='green', label='I')
ax2.plot(t_days, c[T,'D'], color='blue', label='D')
ax.set_ylim([0, max_conc])
ax2.set_ylim([0,1])
ax.set_xlabel("time (days)")
ax.set_ylabel("impurity concentration (%)")
ax2.set_ylabel("intermediate and drug concentration (%) ")
ax.legend(['P'], loc="upper left")
ax2.legend(loc="upper right")
ax.set_title("%s C"%(Temperatures[i]))
Concentration profiles: The solid lines are the exact concentrations of the components $P$, $I$, $D$ as a function of time calculated from the defined reaction parameters. The filled symbols denote the experimental time points (defined by the user in the parameter list).
In [5]:
results
Out[5]:
Theoretical concentration measurements: The above tabulates the theoretical impurity concentrations, $P$, at the measurement time points.
In [6]:
t_days = t / (24*3600)
max_conc = concentrations.iloc[:, concentrations.columns.get_level_values(1)=='P'].max().max()
fig = plt.figure(figsize=(14,10))
for i, T in enumerate(Temperatures):
ax = fig.add_subplot(2,2,i+1)
ax.plot(t_days, concentrations[T,'P'], color='red', label='P')
ax.plot(t_days, [impurity_setpoint]*t_days.shape[0], color='red', ls=':')
ax.plot(days[i], results.loc[T].P, ls="none", marker='o', color="red")
ax.set_ylim([0, max_conc])
ax.set_xlabel("time (days)")
ax.set_ylabel(r"impurity concentration (%)")
ax.legend(['P'], loc="upper left")
ax.set_title("%s C"%(Temperatures[i]))
ax2 = ax.twinx()
ax2.plot(t_days, concentrations[T,'I'], color='green', label='I')
ax2.plot(t_days, concentrations[T,'D'], color='blue', label='D')
ax2.set_ylim([0,1])
#ax2.set_ylabel(r"drug concentration")
ax2.legend(loc="upper right")
Concentration profiles: The plots above show the concentration profiles over a duration sufficient that the impurity concentrations reaches the user defined set point (denoted by the red dashed line).
In [7]:
predictions
Out[7]:
Shelf life: The above tabulates the theoretical shelf life in years calculated from the user defined kinetic parameters.
The actual reaction mechanism is unknown in advance, thus a simple reaction mechanism is assumed.
The time is then calculated as:
\begin{equation}
t = \frac{P}{\hat{A} \exp(-\frac{\hat{E}}{RT})}
\end{equation}
In [8]:
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
r = results.iloc[results.index.get_level_values(1)!=0].copy()
r['y'] = np.log(np.divide(r['P'].values, r.index.get_level_values(1).astype('f8').values*24*3600+1))
r['x'] = 1./(r.index.get_level_values(0).astype('f8').values + 273)
ax.plot(r.x, r.y, ls='none', marker='o')
ax.set_xlabel(r'$1/T$')
ax.set_ylabel(r'$\ln (P/t)$')
reg = sm.ols(formula="y ~ x", data=r).fit()
pred = reg.params[1]*r.x + reg.params[0]
ax.plot(r.x, pred)
A = np.exp(reg.params[0])
E = -1*reg.params[1]*R
print "regression fit for A = %s "%(A)
print "regression fit for E = %s"%(E)
Regression analysis: The above figure plots the logarithm of the measured impurity concentration divided by time, $\ln (P/t)$, against the reciprical temperature, $1/T$ (filled circles), which would be linear for a zero-order kinetics. The line is the result of a linear fit to the data.
In [9]:
print reg.summary()
In [10]:
temp = R*(predictions.index.astype('f8').values+273)
k = A * np.exp(-E/temp)
predictions['zero_order'] = impurity_setpoint / k * 1/(3600*24*365)
In [11]:
predictions
Out[11]:
Table of predicted results The above table compares the theoretical shelf-life in years (calculated from the exact reaction mechanism and user defined parameters) to the predicted shelf-life assuming zero order kinetics.
Assuming a reaction mechanism as:
One method to obtain the Arrhenius parameters is to a stepwise calculation. First, the impurity concentration experiments conducted at the same temperature, $T$, the kinetic parameter, $k$, at the temperature $T$ can be obtained by plotting log of the drug concentration over time (recall that $P = D_{To} - D$). \begin{eqnarray} D_{To} - D = D_{To} (1-e^{-k t}) \\ D = D_{To} e^{-k t} \\ \ln D = \ln D_{To} - k t \\ \ln \left ( \frac{D}{D_{To}} \right ) = - k t \end{eqnarray}
In [12]:
D = Do + Io - results['P']
data = np.log(D/(Do+Io)).to_frame('ln(D/Do)')
data['t'] = data.index.get_level_values(1)*24*3600
data['k'] = [0]*data['t'].shape[0]
colors = ['blue','green','red','orange']
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
for i, T in enumerate(results.index.levels[0]):
ax.plot(data.loc[T]['t'], data.loc[T]['ln(D/Do)'], ls='none', marker='o', color=colors[i])
slope = sm.OLS(data.loc[T]['ln(D/Do)'], data.loc[T]['t']).fit().params[0]
data.loc[(T,slice(None)),'k'] = -1 * slope
ax.plot(data.loc[T]['t'], data.loc[T]['t']*slope,
ls='--', color=colors[i], label="%.2e 1/s"%(-1*slope))
ax.set_ylabel(r'ln $\left ( D/D_{To} \right )$')
ax.set_xlabel(r'$t$ (sec)')
plt.legend(loc='best')
Out[12]:
Figure $\ln \left ( \frac{D}{D_o} \right ) ~\mathrm{vs}~t$: to estimate the the rate constant at each experimental temperature, which is subsequently used to estimate the activation energy and collision factor for initial guess in the non-linear regression.
In [13]:
data
Out[13]:
Table of the kinetic parameter, $k$ regressed from data at the different experimental temperatures.
Once the kinetic parameter, $k$, is estimated at the different temperatures, the Arrenius parameters, $A$ and $E$, can be obtained by plotting the logarithm of the rate constant at each experimental temperature against the reciprical temperature. \begin{equation} \ln k = \ln A - \frac{E}{R} \frac{1}{T} \end{equation}
In [14]:
df = pd.DataFrame({'x': 1./(data.index.levels[0]+273).values, 'y': np.log(data.loc[(slice(None),0),'k'])})
reg = sm.ols(formula='y~x', data=df).fit()
A = np.exp(reg.params['Intercept'])
E = -1 * R * reg.params['x']
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.plot(df.x, df.y, marker='o', ls='none', color='blue', label='data')
ax.plot(df.x, df.x*reg.params['x'] + reg.params['Intercept'], ls="--", color='blue', label='fit')
ax.set_xlabel(r"$1/T$ (1/K)")
ax.set_ylabel(r"log $k$")
ax.legend()
print "regression fit for A = %.2e"%A
print "regression fit for E = %.2f"%E
Figure $\ln k ~\mathrm{vs}~\frac{1}{T}$: to estimate the activation energy and the collision factor to use as initial guesses in the subsequent nonlinear regression.
The estimated Arrhenius parameters can now be used to estimate shelf-life.
In [15]:
temp = R*(predictions.index.astype('f8').values+273)
k = A * np.exp(-E/temp)
predictions['first_order'] = impurity_setpoint / k * 1/(3600*24*365)
predictions
Out[15]:
According to a paper by Fung (Statistical prediction of drug stability based on nonlinear parameter estimation, Fung et al Journal Pharm Sci Vol 73 No 5 pg 657-662 1984), improved confidence can be obtained by performing a non-linear regression of all the data simuntaneously, instead of the stepwise approach. In Fung's paper, they pre-assumed a shelf-life defined by 10% degredation of the product, which simplifies the estimation. However, the regression done here will be performed without the assumption.
Thus, the Arrhenius parameters are found by finding the roots to the above equation, where $A$ and $E$ are the variables.
In [16]:
D = Do + Io - results['P'].values
df = pd.DataFrame(np.log(D/(Do+Io)), columns=['D*'])
df['t'] = data.index.get_level_values(1)*24*3600
df['T'] = data.index.get_level_values(0) + 273.
df = df.loc[df['t']!=0]
def error(p):
A = p[0]
E = p[1]
return np.sum((df['D*']/df['t'] + A * np.exp(-1. * E/(R*df['T'])))**2)
The optimization will be performed over a range of initial conditions to inspect convergence problems.
In [17]:
xo = np.array([A,E])
opt = []
for d in np.arange(.1,1.5,.1):
opt.append(minimize(error, xo*d, tol=1e-30))
Plot the square errors for a series of initial conditions.
In [43]:
e2 = np.array([error(r.x) for r in opt])
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")
A, E = (opt[e2.argmin()]).x
print "optimal A = %.2e 1/s" %A
print "optimal E = %.2f kcal/mol K" %E
Figure of square error indicates that the objective function likely has some very shallow gradients near the minimum, thus inhibiting convergence to a unique value.
In [44]:
temp = R*(predictions.index.astype('f8').values+273)
k = A * np.exp(-E/temp)
predictions['first_order_nonlinear'] = impurity_setpoint / k * 1/(3600*24*365)
predictions
Out[44]:
Table of precition comparison suggests that zero_order was actually the most accurate of the models. It should be noted that the convergence of the non-linear regression is questionable. The under prediction of shelf-life is to be expected, because of the faster reaction of the amorphous form that gets consumed early in the process.